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Model selection is important in any statistical analysis, and the primary goal is to 
find the preferred (or most parsimonious) model, based on certain criteria, from a 
set of candidate models given data. Several recent publications have employed the 
deviance information criterion (DIC) to do model selection among different forms 
of multilevel item response theory models (MLIRT). The majority of the practition- 
ers use WinBUGS for implementing MCMC algorithms for MLIRT models, and the 
default version of DIC provided by WinBUGS focused on the measurement-level pa- 
rameters only. The results herein show that this version of DIC is inappropriate. 
This study introduces five variants of DIC as a model selection index for MLIRT 
models with dichotomous outcomes. Considering a multilevel IRT model with three 
levels, five forms of DIC are formed: first-level conditional DIC computed from the 
measurement model only, which is the index given by many software packages such 
as WinBUGS; second-level marginalized DIC and second-level joint DIC computed 
from the second-level model; and top-level marginalized DIC and top-level joint 
DIC computed from the entire model. We evaluate the performance of the five model 
selection indices via simulation studies. The manipulated factors include the number 
of groups, the number of second-level covariates, the number of top-level covariates, 
and the types of measurement models (one-parameter vs. two-parameter). Consid- 
ering the computational viability and interpretability, the second-level joint DIC is 
recommended for MLIRT models under our simulated conditions. 


Model selection is important in any model-based inference. Taking item response 
theory (IRT) model as an example, the selection of a misspecified model leads to 
not only theoretically different interpretations of the data but also inappropriate con- 
clusions with respect to other IRT applications such as biased parameter estimation, 
differential item functioning (DIF), or inappropriate person-fit assessment (DeMars, 
2010). 

The ease of fitting hierarchical models using Markov chain Monte Carlo (MCMC) 
algorithm has facilitated the development of model selection criterion within 
Bayesian framework. Congdon (2003) provided versions of Akaike’s (1974) infor- 
mation criterion (AIC) and Schwarz’s (1978) Bayesian information criterion (BIC) 
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when the fully Bayesian estimation methods, such as the MCMC algorithm, are used. 
Both AIC and BIC are based on the likelihood with a penalty, and their difference 
lies on the penalty term, which depends on the effective number of parameters in 
the model. This effective number is a measure of model complexity, which is often 
difficult to calculate for hierarchical models. This is because, although the number 
of parameters follows directly from the likelihood, the prior distribution imposes ad- 
ditional restrictions on the parameter space and it reduces the effective dimension 
(Entink, Fox, & van der Linden, 2009; Fox, 2010). 

Within the Bayesian framework, a common approach for comparing two models 
is to compute the Bayes factor (BF; Berger & Delampady, 1987; Gelfand, 1996; 
Jeffreys, 1961; Kass & Raftery, 1995), which is defined as the ratio of the posterior 
probabilities of two models given data. Supposing that the prior densities of both 
models consist of a point mass at their respective MLEs (maximum likelihood 
estimates), and replacing the posterior probabilities by the likelihood of the two 
model parameters evaluated at their respective MLEs, the Bayesian factor becomes 
the classical likelihood ratio (Ando, 2010). Kass and Wasserman (1995) showed that 
under certain conditions the BIC was an approximation of the BF. An advantage 
of the BF is its clear interpretation of the change in the odds in favor of the model 
when moving from the prior to the posterior distribution (Lavine & Schervish, 
1999). Unfortunately, BF is quite difficult both to compute and to interpret for high- 
dimensional hierarchical models and for models having improper prior distributions. 

As a remedy, Geisser and Eddy (1979) discussed cross-validation in Bayes re- 
gression model comparison and proposed a so-called pseudo-Bayes factor (PsBF). 
The PsBF uses a “leave-one-out” method to calculate the cross-validation predictive 
densities (Gelfland & Dey, 1994) so that it can avoid intractable computation and 
dependence on the prior. The pseudo-marginal likelihood used here may be inter- 
preted as a predictive measure for a future replication of the given data (Ando & 
Tsay, 2010). The PsBF is provided by the ratio of two such quantities from the two 
competing models. This predictive density yields the conditional predictive ordinates 
(CPO) index such that PsBF can be expressed as the ratio of two CPO indices. One 
drawback of PsBF, as noted by Eklund and Karlsson (2007), is that the division of 
the data into subsets may affect the results. Yet there exist no clear guidelines for the 
division, and the approach is difficult to apply when the data are dependent, as in, for 
instance, time series data. When the number of observations is large, the approach 
consumes a substantial amount of computational time (Ando & Tsay, 2010). 

In addition to BF, Spiegelhalter, Best, Carlin, and Van Der Linde (2002) developed 
the deviance information criterion (DIC) as a measure of global model fit, which is 
computed based on Bayesian posterior estimates of model parameters. DIC is usu- 
ally viewed as the Bayesian counterpart of AIC, which is approximately equivalent 
to AIC for models with negligible prior information, and it is easily obtained as a 
byproduct of the MCMC sampling algorithm. Further, it also makes weaker assump- 
tions and automatically penalizes model complexity (Bolker et al., 2009). Despite 
these advantages, there still exist many weaknesses of DIC, as discussed by Spiegel- 
halter et al. (2002) and comments therein and by a series of articles by Celeux, 
Forbes, Robert, and Titterington (2006), Carlin (2006), Meng and Vaida (2006), and 
Plummer (2006). Because DIC is widely used for hierarchical/multilevel models, the 
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main criticism is that DIC is dependent on the level of parameter specification upon 
which the model likelihood is conditioned (i.e., “parameter of focus’) and hence 
lacks invariance to reparameterization. Millar (2009) presented three variants of DIC 
for a three-level hierarchical model: (1) the conditional DIC, which used likelihood 
conditioned on parameters at the lowest level of the hierarchy, (2) the second-level 
marginalized DIC, which used partially marginal likelihood conditioned on parame- 
ters at the first and second levels, and (3) the top-level marginalized DIC, which used 
marginal likelihood conditioned on parameters at all levels. Hamaker, van Hattum, 
Kuiper, and Hoijtink (2011) introduced the conditional DIC and marginal DIC; the 
marginal DIC is similar to Millar’s second-level marginalized DIC. 

This article focuses on the model selection for the family of multilevel IRT mod- 
els (e.g., Goldstein, 2003; Raudenbush & Bryk, 2002), which are commonly used to 
model nested structures in behavioral and social sciences with categorical outcomes. 
Many researchers have used different evaluation criteria to evaluate the global fit 
of multilevel models. For example, Entink et al. (2009) have used the conditional 
DIC provided in WinBUGS (Spiegelhalter, Thomas, Best, & Lunn, 2003) to assess 
the global fit of a multilevel item response theory model (MLIRT), and other sim- 
ilar applications include Hamaker et al. (2011), Hung and Wang (2012), and Choi 
and Wilson (2016). Fox (2010), in contrast, used the top-level marginalized DIC to 
assess the fit of a MLIRT model with an application to the PISA (Program for In- 
ternational Student Assessment) data. Geering, Glas, and van der Linden (2011) and 
Wang, Chang, and Douglas (2013) used DIC calculated from the joint likelihood 
(without any marginalization) for a linear item cloning model and semi-parametric 
hierarchical linear transformation model, respectively. This type of DIC, however, 
does not fall into any of the three variants of DIC described by Millar (2009). 

Furthermore, Hung and Wang (2012) have used BF to compare the generalized 
multilevel facets model for longitudinal data, and BF was also used by Entink et al. 
(2009). Choi and Wilson (2016) used the posterior predictive model check (PPMC; 
Gelman, Carlin, Stern, & Rubin, 1996; Guttman, 1967) method to investigate the 
effect of incorrect modeling school membership in the analysis of multilevel and 
longitudinal item response data. 

Other information-based indices, such as the average of AIC and BIC, which 
were computed from the post burn-in iterations of MCMC algorithms, were used 
by Cho and Cohen (2010); AIC and the quasi-information criterion (QIC) were used 
by Barnett, Koper, Dobson, Schmiegelow, and Manseau (2010) and compared with 
DIC, and they recommended DIC; conditional AIC computed from the measure- 
ment model, marginal AIC computed from the measurement model and second level 
model, and BIC were used by Hamaker et al. (2011). 

As noted by Hamaker et al. (2011), more often than not the selection of a model 
fit evaluation criterion is based on the capacity of the software. For example, the DIC 
provided in WinBUGS by default is a conditional DIC computed from the level-1 
model, whereas the AIC and BIC from MLwiN (Rasbash et al., 2000), SPSS, R, and 
Mplus (Muthén & Muthén, 2010) are based on the marginal likelihood computed 
from the first and the second-level models. 

Studies related to evaluating the performance of different variants of DIC are 
scarce. To the authors’ best knowledge, there is only one study (Millar, 2009), that 
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used three variants of marginalized DIC to compare the hierarchical Bayesian mod- 
els on over-dispersed count data, and the second-level marginalized DIC was rec- 
ommended. However, this recommendation may not hold for binary item response 
data. Moreover, there were no systematic investigations comparing the performance 
of marginalized DIC and joint DIC. Without integrating out random effects at dif- 
ferent levels of hierarchy, the joint DIC, in theory, should be computationally much 
easier than the marginalized DIC. It is yet to be evaluated in the present study to 
check if joint DIC can still select the best-fitting model under various conditions. 
The current study focuses on multilevel IRT (MLIRT) models and is designed 
to investigate the performance of five variants of DIC: the first-level conditional 
DIC (DIC), the second-level marginalized DIC (DICs), the second-level joint DIC 
(DICjs), the top-level marginalized DIC (DIC7), and the top-level joint DIC (DICj,). 
The outline of the rest of the article is as follows. In Section 2, we briefly review 
the multilevel IRT models. In Section 3, we present five variants of DIC as model 
selection criteria for MLIRT models. Simulation studies are provided in Section 4 
to illustrate the performance of these five variants of DIC. A real-data analysis is 
provided in Section 5. We end with some concluding remarks in Section 6. 


Model Description 


In this section, we give a brief overview of the MLIRT model that is considered 
in this article. Interested readers can refer to Fox (2010) for a full description of the 
family of MLIRT models. The MLIRT model consists of two components. 


Measurement Model: Level 1 


In this article, the probit link is considered as the linking function, so that the 
posterior distributions of item parameters and latent trait (i.e., ability) have closed 
forms, which facilitate computing marginal likelihoods. And the two-parameter nor- 
mal ogive (2PNO) model and one-parameter normal ogive (1PNO) model (Baker 
& Kim, 2004; Embretson & Reise, 2000; Hambleton & Swaminathan, 1985; Lord, 
1980) are considered as the measurement models, because Kang and Cohen (2007) 
reported that DIC did not work well with data from the three-parameter models. Let 


yijk denote the binary scored response of examinee i (i= 1, ..., nj) in group j (j = 1, 
...,J) on item k (k= 1, ..., K); the probability of a correct response is given by 
P(yije = 1|8;j, ax, De) = (a0 — bx), (1) 


where ®(.) denotes the standard normal cumulative distribution function, and a, and 
b, are the discrimination and difficulty parameters of item k, set a, = | when 1PNO 
model is used. 6;; denotes the ability of examinee i in group j. Hereafter, the parame- 
ters of item k will also be succinctly denoted by a 2-by-1 vector, &;, that is, &, = (ax, 
b,)’. 


Structural Multilevel Model: Level 2 and Level 3 


The structural multilevel model explains the relations between the latent variables 
and other observed variables: 
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Level 2 
8 = Boj + Bijxiy + +++ + Bai rqy + +> + Bajxoy + ei, (2) 
Level 3 
Boj = Yoo + Yor@1j +++: + Yos@sj; + Uoj; 
Bij = Viotyu@i +--+: +yis@sj + u1;, 
Boj = Yoo + Yar@1j +:+* + Yos@sj + Ug;- (3) 
In level 2, x4; denotes the gth (¢ = 0, ..., Q) individual-specific covariate of ex- 


aminee i in group j, such as socioeconomic status or gender. B,; is the corresponding 
regression coefficient, and e;; denotes the random effect at an individual level and is 
assumed to follow a normal distribution with a constant variance, that is, e;, ~ N(O, 
o ). In level 3, w,; denotes the sth (s = 0, ..., S) school-specific covariate of group 
j, such as teacher satisfaction or school climate. y,; is the corresponding regression 
coefficient, u,; denotes the random effect at the school level and u; ~ N(O, T), where 
To 77 OO 
T=]: -. : | isa Q-by-Q covariance matrix. 
OY aes wD 0 
Some restrictions are imposed to remove the scale indeterminacy inherent in nor- 
mal ogive models. Two sets of constraints are usually adopted in the literature. One is 
to fix the scale of the ability to a standard normal distribution. As a result, the struc- 
tural multilevel IRT model in Equations 2 and 3 is identified owing to the fixed scale 
of the abilities. Another way is to put a restriction on the item parameters, which can 
be accomplished by imposing the restriction a; = 1 and b; = 0. In this article, we 
take the second approach. 
The Gibbs sampler (Albert, 1992; Fox & Glas, 2001) is used for MLIRT model 
estimation. The details are provided in Appendix 1. 


Model Selection Methods 


In this section, we introduce five variants of the DIC of Spiegelhalter et al. (2002) 
that will be used for hierarchical models. The generic form of DIC (Spiegelhalter 
et al., 2002) is expressed as 


DIC = 2 D(®) —D(6), (4) 


where D(©) = —2 log{p(y|©)} + 2 log{p(y)} denotes Bayesian deviance, and the 
overline denotes posterior expectation. In Equation 4, y is the data matrix, and the de- 
viance is conditional on parameter vector ©, which is termed the “parameter of inter- 
est” or “parameter in focus” by Spiegelhalter et al. (2002). Here, it is sufficient to as- 
sume that the standardizing factor p(y) equals one such that D(@) = —2 log{p(y|©)} 
(Fox, 2010). The number of effective model parameters pp equals D(@) — D(©) ; 
Finally, the best-fitting model is associated with the smallest DIC value. 


School j 


v v v 
DIC; & DICj; DIC, & DIC/s DIC; 


Figure J, An illustration of five variants of DIC from a MLIRT model. (Color 
figure can be viewed at wileyonlinelibrary.com) 


Figure | shows the parameters of focus in these five variants of DIC from a MLIRT 
model. Three boxes are plotted, implying that items are nested within individuals, 
whereas individuals are nested within schools. Different variants of DIC are marked 
graphically in this figure. For example, DICc only focuses on the lowest measure- 
ment model level, whereas DICs and DICjs focus on the second level, and the last 
two focus on the top level. Observed variables are in squares, latent variable and 
parameters are in circles. 

As the Gibbs sampler is used to fit the MLIRT model, after augmenting the discrete 
data y with the continuous data z, the DIC in Equation 4 can be expressed as the 
integrated augmented DIC; that is, 


DIC = f [DIC 2, ©}: p@, Oly)dad@ 
= | Poa. ©) — D(z, ©)]- p(, Oly) dzdo 


= | {-4F 02 {log [p (z|@)]} + 2log[p (z|O)]}- p@, Oly) dzd@, (5) 


where p(z|®) is the augmented likelihood function, © is the matrix of the parame- 
ters, and @ is the point estimated value based on MCMC samples. Zijk Can be sampled 
from the full conditional posterior density: 


N(a;.0;; — b;, 1) truncated at the left by 0 if yj, = 1 
N(ax9i — bx, 1) truncated at the right by 0 if yj, = 0° 


Zijk| 9,5, ~ (6) 
The detailed calculations of the augmented likelihood for five versions of DIC are 
presented in Appendix 2. Hereafter, we focus the definition calculation of D(z, ©), 
instead of D(O). 
In the present context, it is computationally most convenient to take © as the 


parameters in the lowest (i.e., measurement) level of the hierarchy, as specified by 
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Equation 1. This will be referred to as the conditional model and the corresponding 
deviance is 


D(z, 9, §) = —2 log{p(z|9, § )}. (7) 


The DIC calculated using the deviance defined in Equation 7 is the first-level con- 
ditional DIC (DICc; Millar, 2009). The first-level conditional DIC is defined simi- 
larly as the DIC used by Entink et al. (2009) and Fox (2010). Millar (2009) men- 
tioned that the first-level conditional DIC ignored the immediate information pro- 
vided by the higher-level structures, although the estimates of € and @ may still carry 
the higher-level information indirectly, therefore it was not sensitive to differenti- 
ate models that differ in higher levels. Even so, given its computational ease, it is 
still widely used and it is actually the default option in WinBUGS. Also note that 
the first-level conditional DIC is built on the “complete-data” likelihood from level 
1 model, assuming random effect at level 1 (e.g., 8) is known by plugging in their 
estimated values. In so doing, it treats the estimated random effects as fixed values 
when computing DIC. 

Furthermore, the parameters at the second level can sometimes be of interest when 
one intends to evaluate whether certain individual level (1.e., level 2) covariates have 
significant effects. Both marginal likelihood and joint likelihood are considered here. 
The second-level marginalized deviance D(z, é, B, o”) is defined as 


D(z, &, 0°, B) = —2 log{ p(z|&, 0° , By}. (8) 


Fox’s method (Fox, 2010) for calculating marginalized Bayesian deviance and 
Chib’s method (Chib, 1995) for calculating marginal densities are applied here. The 
idea is to obtain a closed-form expression of the marginal likelihood using Bayes’s 
formula. Hence, the corresponding second-level marginalized distribution has the 
following density: 


&,0°,B) 


pt / pag, 0)p(6 |o2, Bde 


p(Z, 9 


_ €,07,B) 7 p(z|&, 0)p(0|o*, B) 
p@ 


z,€,07,B) — p(®|z, &, 02, B) 


(9) 


The three parts on the right side of Equation 9 all have closed forms and the full 
computation details are provided in Appendix 2. The DIC calculated using the de- 
viance in Equation 8 is the second-level marginalized DIC (DICs; Millar, 2009). As 
compared to the complete-data likelihood used in the first-level conditional DIC, this 
form of DIC uses the second-level marginalized likelihood (see Equation 9), which 
can be viewed as an observed likelihood from model level 2. The “missing” data of @ 
is marginalized, and hence the second-level marginalized DIC belongs to the family 
of “observed” DICs (Celeux et al., 2006). 

The second-level joint deviance D; (z, €, 0”, B) is defined as 


D; (z, &, 0°, B) = —2 log {p (z, 0|&, 0°, B)}, (10) 
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and then the corresponding second-level joint distribution is given by 


p (z,0|&,0°,B) = p(zlé, 8) p (Oo, B). (11) 


The DIC calculated using the deviance in Equation 10 will be called the second- 
level joint DIC (DICjs), which can be treated as one kind of complete DIC (Celeux 
et al., 2006) because it is again based on complete-data likelihood. The unobserved, 
missing random effects are treated as known. Actually it is a semi-complete DIC 
because there is no immediate information from the highest level, that is, level 3. 

It is also possible to calculate a top-level marginalized DIC or top-level joint DIC 
when the interest is focused on the significant school level (i.e., level 3) effects. The 
top-level marginalized deviance is given by 


D (2, &, 0°, y, T) = —2log {p (z|&, 0°, y, T)}. (12) 


Based on the Bayes’ formula, the corresponding density can be expressed as 


p(z|é, 0°, y,T) = [ vas. c°. eye. T)dB 


= // p(z|g, 9) plo”, B)pBly, T)dedB 


Ply, T) 


2 
=p). 
P(als, 0°. By lz, §, 07,1 


(13) 


The three parts on the right side of Equation 13 all have closed forms, and the 
computation details will also be provided in Appendix 2. The DIC calculated using 
the deviance in Equation 12 is the top-level marginalized DIC (DIC7; Millar, 2009). 
Similarly, the top-level marginalized DIC is another version of observed DIC (Celeux 
et al., 2006), which can be calculated using an observed likelihood from model level 
2 and level 3. The “missing” data of 8 and 8 are integrated out. 

The top-level joint deviance is 


D; (z,§, 0°, y, T) = —2 log {p (z, 6, B 


and the corresponding likelihood is 


p (z, 9,8 |&,0° ,y, T) = p(zlé, 9) p (0|o*, B) p@ly. T). (15) 


The DIC calculated using the deviance in Equation 14 will be called the top-level 
joint DIC (DICjz). It is a complete DIC, which contains all the information from the 
model. 

As both marginal likelihood and joint likelihood are used frequently in statisti- 
cal analysis/inference (Bjérnstad, 1996), DIC could be formed using either form of 
the likelihood. Celeux et al. (2006) provided a thorough discussion about the perfor- 
mances of different versions of DIC in a missing data model; however, their findings 
may not be easily generalizable to hierarchical models because the missing data (due 
to random effects) can occur at different levels of the model. Hence, our study will 
provide a unique contribution to the literature by comparing the performances of the 
variants of DIC under various conditions often seen in hierarchical models. 


Bo vy, Ths (14) 
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Table 1 
Models Used in Simulation Studies 


Measurement Model Individual-Level Model School-Level Model 
Boj = Yoo + Yor@1j + Uo; 
Model 1 2PNO 93 = Boj + Bijxuy + ei : : - 
i = Boj + Brug + ey Bij = Yio + Yu@1j + U1; 
Boj; = Yoo + Uoj 
Model 2 2PNO 9; = Boj; + Bijxiy + ey 4 : 
uv = Boy + Bury + ey Bij = Yio + 1; 
Model 3 2PNO 8; = Bo; + ei Boj; = Yoo + Uoj 
Model 4 1PNO B= Bo piu Boj; = Yoo + Yor®1; + Uo; 


Bij = Vio t+ Yu@1y + U1; 


Note. The notations are deferred to Equations | to 3. 


In Equations B14 and B15, Hj is a Knj-by-Kn; symmetry matrix. The inverse of 
Hj; should be calculated M x J times when calculating the top-level marginalized 
DIC, where M is the number of interim values from post-burn-in iteration, which are 
used to calculate the posterior mean, and J is the number of groups (see Equation 1). 
When the number of interim values or groups is large, yielding a high-dimensional 
matrix, taking the matrix inverse becomes computationally prohibitive. Therefore, 
the top-level marginalized DIC is computationally much more demanding than other 
variants of DIC. 


Simulation Studies 


In this section, simulation studies are designed to evaluate the performance of the 
five versions of DIC in terms of selecting the correct model. The true multilevel IRT 
model differs by (1) whether significant individual- and/or school-level covariates 
are included; (2) whether 2PNO or IPNO is used as the true measurement model. 
Two simulation studies are performed and they are described in detail below. 


Simulation Design 


For meaningful examination of the behavior of DICs, the simulated data must be 
generated from a plausible model (Sinharay & Stern, 2003).There are four MLIRT 
models to be chosen, named Models | to 4. Table | shows the specifications of the 
four models. 

Tables 2 and 3 show the simulation design for Study 1 and Study 2, respectively. 
There are 36 (2 group size x 2 test length x 3 number of covariates x 3 measurement 
models) conditions in Study 1 and 8 (2 group size x 2 test length x 2 measurement 
models) conditions in Study 2. It is typical to consider one covariate in each level of 
structure models, because if more than one covariate in the higher-level of structure 
models (i.e., level 3 in this study) are considered, the 95% posterior credible inter- 
val (P.C.I.) can be calculated as a variable selection index, which we also report in 
Study |. The Gibbs sampler is used to obtain the estimated values of the parameters. 
The source code is available to readers upon request. 
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Table 2 
Fixed and Manipulated Conditions and Parameter Values in Study 1 


Study 1 Generation Model Model 1 or Model 2 or Model 3 
(Fox & Glas, 2001) Calibration model Model | and Model 2 and Model 3 
Fixed conditions 
Examinee sample size 2,000 
a, Item discrimination a; ~ log N(Wa = exp(1), og = .25) 
by Item difficulty b, ~ N(wp = 0, oy = -5) 
x,; Individual-specific x1; ~ N(wy = 0, oy = 1) 
covariate 


e; Individual-specific random ej ~ N(We = 0, oe = .2) 
effect parameter 

w,; School-specific covariate w,; ~ N(Wo = -5, 60 = 1) 

Yoo = —.30, yor = .15 


School-specific regression 
Y P 8 Yio = 35, Yu= 1.0 


coefficients 
[i |School-specific random [io] ~ 
ij uj 
effect parameters MVN (i = [0] eee [' |) 
Manipulated conditions 
Group size Small: 10 
Large: 200 
Test length Small: 10 
Large:30 
Number of covariates One individual-specific covariate and 


one school-specific covariate 

One individual-specific covariate but no 
school-specific covariate 

No covariate 


Without loss of generality, an additional simulation check was done based on Co- 
hen, Kane, and Kim’s (2001) index to detect the number of replications. Take 0 as 
an example, let r= 1, ..., R denote replications. Cohen et al. (2001) calculated a 
magnitude of the differences between the average of MSE(@) (denoted as AMSE(@)) 
under manipulated conditions. Then under a stringent tolerance criterion, we can 
obtain that 


( Orisa, + Tm, 7 R 
0.1 x |AMSE(6),, — AMSE@),5|’ 


R 


IV 


(16) 


where the subscript “cl” and “c2” denote two different conditions, rst is the stan- 
dard deviation of MSE(@), and R denotes the least necessary number of replications. 
Under the simulation conditions, when R was set to 50, the right side of Equa- 
tion 16 was always less than 10, which means 10 or more replications are enough. 
Therefore, we consider 50 to be a reasonable and adequate number for this study. 
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Table 3 
Fixed and Manipulated Conditions and Parameter Values in Study 2 


Study 2 Generation Model Model | or Model 4 
Calibration model Model | and Model 4 
Fixed conditions 
Examinee sample size 2,000 
Number of covariates One individual-specific covariate and one 
school-specific covariate 
bux; .eiorj¥[u” | The same as those in Study | (Table 2). 
Manipulated conditions 
Group size Small: 10 
Large: 200 
Test length Small: 10 
Large: 30 
a, Item discrimination 2PNO: a; ~ log N(Wa = expC), og = .25) 
1PNO: a, = 1 


Geweke’s (1992) convergence diagnosis method was used to diagnose conver- 
gence. Three thousand iterations were treated as the initial phase; after that, under 
Geweke’s approach, for a given parameter, a z score, which is defined as the dif- 
ference between the first n4 (n4 = 1,000) and the last ng (ng = 1,000) iterations is 
computed as evidence of convergence, 


64 — 68 
ina! Se(0)4 — n'5'S0(0)? 


£6 = 


(17) 


where 6 denotes the sample mean of @ and $,(0) denotes the consistent spectral den- 
sity estimate. The z score tends to follow a standard normal distribution as n — oo 
(Geweke, 1992). Hence, a z score less than 1.96 implied parameter convergence. 

Through all the conditions, the Markov chain stabilized after 5,000 iterations. 
Hence, a chain length of 10,000 iterations with a burn-in of 5,000 is chosen rea- 
sonably for this study. We sampled one out of 20 points from the sampling phase to 
calculate the model selection criteria. 


Result of Study 1 


Table 4 presents the proportion of correct model selection for each of the five 
variants of DIC in Study |. The values in the table indicate, out of the three fitted 
models, how often each of the indices selected the true model. 

DICc¢, based on the measurement model, chose the correct model for both test 
lengths and both group sizes when Model | or 3 was the generation model. However, 
when the data were generated from Model 2, when the test length was small, DICc 
chose Model 1 44% of the time and Model 2 56% of the time for the small group 
size, and it chose Model 1 58% of the time and Model 2 42% of the time for the large 
group size, and when the test length was large, DIC; chose Model 1 46% of the time 
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and Model 2 54% of the time for the small group size, and it chose Model 1 54% 
of the time and Model 2 46% of the time for the large group size. This observation 
indicates that DIC¢ cannot easily distinguish models that differ by person-level 
covariates. 

DICs, based on second-level marginal likelihood, chose the correct model for both 
test lengths and both group sizes when Model 3 was the true model. When Model 1 
was the generation model for both test lengths, DICs could choose the correct model 
for the large group size; however, for the small group size, DICs tended to select 
Model 2 predominately (with the proportion of selection higher than .90). When 
the data were generated from Model 2, for small test length, DIC; chose Model 1 
18% of the time and Model 2 82% of the time for the small group size, and chose 
Model 1 and Model 2 with the same probability for the large group size, and for 
large test length, DICs chose Model 1 24% of the time and Model 2 76% of the time 
for the small group size, and chose Model 1 42% of the time and Model 2 58% of 
the time for the large group size. It appears from the results that Models 1 and 2 
are relatively difficult to differentiate using DICs, and DICs tended to favor Model 2 
when the group size is small. 

DICz, based on the top-level marginal likelihood, chose the correct model when 
the true model was Model 3. When Model | was the true model, DIC; chose 
Model | 2% of the time and Model 2 98% of the time for the small test length and 
the small group size, for large test length, DIC; chose Model | 16% of the time and 
Model 2 84% of the time for the small group size, and for both test lengths and the 
large group size, it could choose the generation model with probability 1. When the 
data were generated from Model 2, the results for both test lengths were the same. 
DIC; selected the true model 84% of the time for the small group size, and it chose 
Model 2 only approximately one-third of the time for the large group size. In other 
words, when Model 2 is the true model and when the group size is large, DIC; cannot 
distinguish the three models very well. 

Based on second-level joint likelihood, when the true model was Model 1 or 
Model 3, DICj; could choose the true model with probability higher than .98. When 
the data were generated from Model 2, when the test length was small, DICjs chose 
the true model 82% of the time for the small group size, and for the large group size 
it chose Model 1 54% of time and Model 2 46% of the time, and when the test length 
was large, DICjs chose the true model 72% of the time for the small group size, and 
for the large group size it chose Model 2 52% of the time. Overall, DICj; can almost 
select the true model for all manipulated conditions. 

When Model | was the generation model, DICj7, based on top-level joint likeli- 
hood, chose Model 2 with probability | for the small test length and the small group 
size, chose Model 1 as the best-fitting model 32% of the time and chose Model 2 
54% of the time for the large test length and the small group size, and for the large 
group size it chose the correct model with probability larger than .94. When the true 
model was Model 2 or Model 3, DICjr chose Model 3 with probability higher than 
.09 for the small test length except when the test length was small and the group size 
was larger; under that condition, DICjr chose Model 3 with probability .54 when 
response data were generated from Model 2 and with probability 1 when data were 
generated from Model 3, and it chose Model 3 with probability higher than .72 for 
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Table 5 
Proportion of the 95% P.C.1.s of y = [Yoo Yor, Y10, Yi1] That Contain Zero in Study 1 


Generation Model 


. ; Model 1 Model 2 Model 3 
Calibration 

K J Model Yoo Yor Yio Yit Yoo Yor Yio Yu Yoo Yor Yio Yu 
10 #10 Model 1 52 84 54 0 44 98 46 96 1 1 1 1 
Model 2 68 — 0 - 30 - 26 —- 1 - 1 - 

Model 3 46 - - - 20 - - - 1 - 
200 Model 1 0 0 0 0 0 86 OO 82 O 1 1 1 
Model 2 0 - 0 - 0 - 0 - 0 - 1 - 

Model 3 1 - - - 0 - - - 0 - 
30.—=—s 10 Model 1 60 84 42 0 64 96 48 98 1 1 1 1 
Model 2 64 — 0 - 06 - 0 - 1 - 1 - 

Model 3 62 — - - 06 —- - - 0 - 
200 Model 1 0 0 0 0 0 94 0 84 0O 1 1 1 
Model 2 02 —- 0 - 0 - 0 - 0 - 1 - 
Model 3 02 —- - - 0 - - - 0 - - - 


Note. K = test length; J = group size. ‘-’ means null. 


the large test length. According to the results, Models | and 2 are relatively difficulty 
to differentiate using DICj;, and DICj; cannot easily distinguish models that differ 
by school-level covariates. 

As shown in Table 4, the test length cannot affect the model selection by DICs. 
When Model | was the true model, for a small group size DIC¢ and DICjs performed 
best, and for a large group size all variants of DIC performed well. When Model 2 
was the true model, all variants of DIC had low sensitivities for the large group 
size, DICs and DICjs could choose the true model with probability higher than .50, 
and DICc¢ and DIC7 performed similarly to DICs and DICjs; when the group size 
was small. When Model 3 was the true model, all variants of DIC could choose the 
correct model with probability higher than .82. 

Table 5 presents the proportion of the 95% P.C.I.s for school-specific regression 
coefficient y which contain 0 in Study 1. We consider the proportion of the 95% 
P.C.Ls that contain 0 as an alternative model selection index to evaluate whether 
the inclusion of covariates is needed in the model. This is because Models | to 
3 differ essentially on whether a certain covariate is included in the model. As 
shown in Table 5, the structural multilevel model of the generation model can be 
chosen with probability higher than .82 with the proportion of the 95% P.C.I.s that 
contain 0. 


Result of Study 2 


Table 6 presents the correct model selection proportion in Study 2. When the data 
were generated from Model 1, DICc¢ and DICjs could choose the correct model with 
probability higher than .68, and the correct model could be selected by DICc, DICjs, 
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Table 6 
Correct Model Selection Frequency of Study 2 


Generation Model 
Coiieaton DICc DIC; DIC; DICjs DICjr 
K J Model MI! M4 MI M4 M1 M4 M1 M4 MI M4 


10 10 Model 1 92 = .22 0 0 0 0 84 34 42 ~~ 44 
Model 4 08 .78 1.00 1.00 1.00 1.00 16 66 58  .56 

200 Model 1 86 = .44 0 0 04 10 68 46 54 450 
Model 4 14 56 1.00 1.00 90 90 32 54 46 «50 

30 =—10 Model 1 1.00 .04 0 0 0 O 1.00 .04 1.00 .56 
Model 4 0 96 «61.00 1.00 1.00 100 0 96 0 44 

200 Model 1 1.00 12 0 0 14 .24 1.00 .10 1.00 42 
Model 4 0 88 1.00 1.00 .86 .76 0 90 0 58 


Note. K = test length; J = group size; DICc = conditional DIC; DICs = second-level marginalized DIC; 
DIC7 = top-level marginalized DIC; DICjs = second-level joint DIC; DICj7 = top-level joint DIC; 
M 1 = Model 1; M 4= Model 4. 


and DICjr with probability | for the large test length. When Model 4 was the gener- 
ation model, all indices except DICj; could choose the generation model with prob- 
ability higher than .54, and DICj; could choose each model with similar probability. 
It appears that longer test length helps distinguish the number of item parameters in 
measurement models. 


Real-Data Illustration 


In this section, we present a real-data example to illustrate the application of 
the DIC indices. A data set from the Program for International Student Assess- 
ment (PISA) 2012 assessment (http://www.oecd.org/pisa/data/pisa20 1 2database- 
downloadabledata.htm) was analyzed. 


Data Source 


The PISA, collected by the Organization for Economic Co-operation and Develop- 
ment (OECD), is conducted to assess students’ performance and explore the effects 
of student and institutional factors on student performance. In 2012, 65 countries 
participated in the assessment, and the survey covered mathematics, reading, sci- 
ence, and problem solving. In this section, we will focus only on the mathematics 
test given to 15-year-old United States students in 2012. The data set contains 4,978 
students from 162 schools, and each student responded to 49 multiple choice items 
scored dichotomously. In addition, we chose the indices of economic, social, and 
cultural status (ESCS) and school location as individual-level and school-level co- 
variates, respectively. For data cleaning, we deleted all schools in which there were 
fewer than 10 students. The resulting sample size entered into the final analysis was 
4,882 students from 154 schools. 
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Table 7 


Model Comparison Results for the PISA Example 


DICc(Rank) DICs(Rank) = DIC;(Rank) — DICjs(Rank) ~— DIC (Rank) 
Model 1 91166.62(1) — 52,9868.71(5)  52,9904.25(5) 89,426.13(2) 11,4476.65(4) 
Model 2 — 91,167.25(2) — 52,9937.53(6) 52,9958.12(6) —89,450.28(3) 11,4275.47(3) 
Model 3 91,563.14(3) 52,8752.41(4) —52,8709.88(4) 88,273.01(1) —_78,486.09(1) 
Model 4 93,343.43(4)  52,3447.23(2) —52,3607.71(2) —93,913.58(5) — 12,0908.47(5) 
Model 5 —93,348.21(5) —52,3454.88(3) —52,3618.39(3) —93,913.65(6) — 12,0996.91(6) 
Model 6 93,977.81(6) 52,2507.45(1) 52,2553.78(1) 93,507.10(4)  83,857.69(2) 


Note. DICc = conditional DIC; DICs = second-level marginalized DIC; DIC7 = top-level marginalized 
DIC; DICjs = second-level joint DIC; DICj7 = top-level joint DIC. 


Table 8 
95% P.C.L.s of y for the PISA Example 


Est. PCI. Est. PCI. 
Model 1 yoo —.0358  [—.0959,.0239] Model 4 yop +—.0291 [—.0905,.0341] 
Yor —.00018 [—.0192,.0160] Yo. —.0039 [—.0222,.0147] 
Yio 0192 [~.0420,.0792] Yio 0474 [—.0209,.1129] 
yi, —.0019 — [—.0196,.0163] yi 0031 [—.0217,.0165] 
Model 2 yo) —.0416  [—.0630,—.0200] Model yoo —.0416 [—.0637,—.0188] 
Yio 0131 ~~ [~.0081,.0344] Yio 0368 ~—‘[.0145,.0599] 
Model 3 yoo —.037  — [~.0592,—.0156] Model6 yoo —.0323  [—.0548,—.0107] 


Estimation of Model Parameters 


Six models were applied to the PISA data; they were Models | to 4, described 
in the simulation studies, in which ESCS was considered as the individual-level co- 
variate, and school location was considered as the school-level covariate, along with 
two additional models: the 1PNO model as the measurement model, and a structural 
multilevel model similar to Model 2 and Model 3. For the MCMC algorithm, accord- 
ing to Geweke’s convergence criterion, a conservative burn-in of 5,000 iterations and 
5,000 post-burn-in iterations was used here. 


Results 


Model selection results for the real-data illustration are given in Tables 7 and 8. 
As noted above, the smaller the DIC, the better the model—data fit. Meanwhile, a 
popular rule of thumb for model comparison (e.g., Spiegelhalter et al., 2002), is that 
a difference of 2 or less is considered negligible, a difference between 3 and 7 pro- 
vides positive support for the model with a lower value, and a difference exceeding 
7 constitutes strong support. 

As shown in Table 7, because the difference in the DICc values between Model 1 
and Model 2 was less than 1, the two models are nearly equally favorable based on 
DICc. Model 3 was selected by DICjs, and DICj7 and Model 6 was selected by DICs 
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and DIC;, because they generated the smallest DIC value. Table 8 provides the 95% 
P.C.Ls for the coefficient y. As seen in Table 8, when comparing Model | to Model 
3, it appears that the 95% P.C.I.s of y contained 0 in Model | and Model 2, indicating 
that these parameters are not significantly different from 0. Conversely, the P.C.I.s of 
Yoo in Model 3 did not contain 0, implying that this parameter must be included in the 
model. Taken together, Model 3 is the preferred model among the three fitted models, 
of which 2PNO models are the measurement models. When comparing Model 4 to 
Model 6, this observation indicates that the 95% P.C.I.s of y contained 0 in Model 4, 
indicating that these parameters are not significantly different from 0. Conversely, the 
P.C.I.s of yoo in Models 5 and 6 and the P.C.I.s of yio in Model 5 did not contain 0, 
implying that those parameters must be included in the model. Taken together, Model 
5 is the preferred model among the three fitted models of which the 1PNO models 
are the measurement models. Considering the results from all indices, it appeared 
that Model 3 may be the best fit for the data. 


Discussion 


The present study was motivated by two observations. First, there is a lack of ef- 
fective model selection criteria for multilevel IRT models. Because the MCMC sam- 
pling method is often used to estimate multilevel IRT models, DIC becomes a natural 
option for evaluating the global model fit. Second, for multilevel models, DIC can be 
constructed differently depending on (1) whether joint or marginal likelihood is used 
and (2) the target parameters of interest. There is currently not enough information 
on which version of DIC is recommended for multilevel IRT models when different 
measurement models are considered and when various levels of covariates are in- 
cluded in the model. The main purpose of this study was to examine the accuracy of 
five DIC-based indices in the selection of a best-fitting MLIRT model. 

Across all simulation conditions, second-level joint DIC is recommended for 
MLIRT models because it almost selects the correct model and is computationally 
less demanding than some other alternatives. Conditional DIC sometimes tends to 
choose a more complex model for a large group size. Second-level marginalized 
DIC and top-level marginalized DIC perform similarly; they tend to choose a sim- 
pler model for a small group size. For a large group size, top-level marginalized DIC 
performs worse than second-level marginalized DIC. Because the numerical com- 
plexity of top-level marginalized DIC is far more demanding, top-level marginalized 
DIC is not recommended for all conditions for multilevel models, and top-level joint 
DIC tend to select a simpler model regardless of the group size. 

Additionally, we conducted a sensitivity analysis and found that the computed 
joint DICs showed more variation than the marginalized DICs. This happened be- 
cause, when to calculating the joint DICs, the point estimates of individual 8 and & 
were plugged in and these point estimates are prone to measurement errors. How- 
ever, when calculating the marginalized DICs, the influences of random effects were 
integrated out in the marginal likelihood. That is the reason why the joint DICs have 
more variation, especially the top-level joint DIC. 

Because this study was a preliminary investigation of the comparison of joint DIC 
and marginalized DIC for MLIRT models, there are several limitations. First, from 
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the empirical perspective, only one individual-specific covariate and one school- 
specific covariate were considered in the real-data illustration; future research should 
assess the effect of more covariates at both the individual level and school level. Sec- 
ond, the performance of DIC-based indices in this study was focused on a multilevel 
data structure at only one time point. Future research surrounding the model selection 
method for MLIRT models can be expanded to select multilevel models for longi- 
tudinal data. Finally, as warned by Celeux et al. (2006) and Li et al. (2009), DIC 
was less accurate with mixture models; one might wish to investigate the restrictions 
under which some type of DIC can perform well for multilevel mixture models. 

As a point of reference, it should be noted that the computation of top-level 
marginalized DIC is extremely time-consuming. Several days of CPU time was re- 
quired on a 3.20 GHz desktop PC to compute per condition, per replication using 
MATLAB 2013a. Other versions of DIC required less than 10 minutes. 
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Appendix 1: MCMC Algorithm for the MLIRT Model 


In this appendix, the MCMC algorithm used in this article is described briefly. In- 
terested readers can refer to Fox (2010) for a full description of the MCMC algorithm 
for the family of MLIRT models. 

The full posterior distribution of the parameters given the data is given by 


JON; K 
p(Z, Q, g, B, 0°, Y, T ly, X, Ww) oe I] IT Il P(Zijk |9,; Ek Vijk ) pO Bj. o, X; ): 
jeli=l \k=1 (Al) 
P(B; 


y. T, W; pv IT) p&)p*) pC) 


Step 1: Sampling z. Given the parameters 0 and &, the variables z;, are indepen- 
dent, according to the definition of z;, it follows that 


P(Zijk [Oss Ses Vie) © W(Zijes 49 — De, DU Gin > OO = D 
+ T(Zijx < OTe = 9)I, (A2) 


where y(-; a,9;; — bx, 1) denotes the normal density with a mean equal to a,0j; - by 
and a variance equals to one, and /(-) is an indicator variable which equals to one if 
its argument is true, and equals to zero otherwise. 

So the fully conditional posterior density of z;, is given by 


N(ax9j — bx, 1) truncated at the left by 0 if yj, = 1 


N(a,x9;; — b,, 1) truncated at the right by 0 if yj, = 0° (3) 


aey~| 


ijk 
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Step 2: Sampling 0. The ability parameters are independent given z, &, B and 07. 
Its full conditional posterior density is given by 


6;/v + x48 j /0? 1 
0; |Z, &, Bj, 2 N d ha) , , A4 
ii |Zij. 5, By, o ( 1/v + 1/0? L/v + 1/02 (A4) 
with 

A. _ aa ak (Zijk + Bx) _ eg =1 

9ij = ae a » V= ‘oem a?) . (A5) 
Step 3: Sampling §. Let E = [6, —1], therefore 

€,10, Ze ~ N(Ex, (EE) ')I(a, > 0), (A6) 


with &, = (E'E)~'EZ,. Hereafter, the superscript ¢ denotes the transposition of a 
matrix. 
Step 4: For each j, sample 8; from the full conditional 


Bj |9;.0°, y, T ~ N(ad, D), (A7) 
where D = (£7'+T"!)"! and d= £516; +T 'w;y, with Dj = 0°(x'x;)! and 


B; = (xix) |x‘ 6). 
Step 5: Sample y from the full conditional 


aT = Re aie ne ee | 2° gest 
y|Bj.T ~ N ‘coe wT ©) as wT Bj, (Oe wT o)) .  (A8) 
Step 6: Sample o” given 0 and B from the full conditional 
o° |0,B ~ Inv — x?(N, S°), (A9) 


where S? = 4 )-j_, (0; — x/Bj)' (0; — xjB))- 
Step 7: Sample T from the full conditional 


T|8, y ~ Inv — Wishart(J, S;'), (A10) 
where $7! = 0¥_, (Bj — @j)(Bj — @; 7)". 


Appendix 2: Calculation of Variants of DIC 


The First-Level Conditional DIC 
The density function of z given (€, 0) in Equation 7 can be expressed as 


1 2 
—Knj/2 
pe Lea en 5 2 ete ee) ie BD 
where n, is the number of examinees in group j. 0;; is normally distributed with vari- 
ance Qy = (a'a+o-7)~! and mean j1p = Qo(a' (zij + Db) + o °xiBj). a is the vector 
of discrimination parameters, b is the vector of difficulty parameters, and z; and 
x;; are the vectors of the augmented data and the individual-specific covariates of 


examinee i in group j, respectively. 
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Hence, the augmented likelihood, focused on the parameters from level 1, used to 
calculate the posterior mean of the deviance is expressed as 


, r 1 r r r)a(r 2 
p (2 |g, 0°) =] [xy exp {-5 (ee +o? - aff?) . (B2) 
J 


Hereafter, the superscript r denotes the interim value from the rth post-burn-in 
iteration. 
In contrast, the deviance evaluated at the posterior mean 0 can be expressed as 


? 1 es 
£,6) =] [ 2)? exp {3 Dd Gi + be - a) | = ABS) 


j i,k 


p(z 


Hereafter, the hat denotes the final point estimate value based on the Gibbs sam- 
pler, and 2 denotes the augmented data based on a, b, and 6. 


The Second-Level Marginalized DIC 

Let Q = (&, 0”, B) denote all model parameters of interest for DICs, where & 
denotes the matrix of item parameters, o” denotes the variance of level 2 random 
effects (see Equation 2), and B denotes the matrix of regression coefficients in level 2. 
The method to calculate the second-level marginalized DIC is similar to Fox (2010). 
Interest is focused on the augmented likelihood in Equation 9, 


p@,9|2) _ plalé, ®)p@|o*, B) 
p@|2, 2) pO |2, 2) 


where p(0|z, &2) is the full conditional probability density function, the density func- 
tion of z given (€, @) is the same as that in Equation B1, and the density function of 


@ given (07, B) can be expressed as 
t 
(6; = x/B;) (6; = x‘B;) 


20? 


Pale, 07, B) = p(z|Q2) = . (BA) 


p (0|o?.B) =] ] 2x0?) *"” exp (B5) 
j 


Hence, the augmented likelihood used to calculate the posterior mean of the deviance 
follows that 


(r) 


. (2 ja”) = anr(S) ox {-s” (0) /2] ; (B6) 


where 
r r r r rar 2 207) [ alr n\ r r 
5 (0) => (eo +4? — af) +07 (0 — x81) (07 — x89”). BT) 
i,k 


However, the deviance evaluated at the posterior mean © follows that 


A 


nj [2 
A 5 [ &te \ AA 
p(@)|2)=am-™2(2) exp[-8@)/2}. BS 
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where 


5 (6)) = 0 (Gin + be — a6)" + 6-7(8; — x)8,)' 6) - x16). B9) 


The Second-Level Joint DIC 
The joint likelihood of all parameters of interest from levels 1 and 2 is written as 


p@, 912) =T] v (eisai — 61. 1) TT ¥ (00s x18, 0), B10) 
i,j,k ij 


where 1y(- ; |, 7) is the normal density function with mean 1 and variance o”. More- 
over, the deviance evaluated at the posterior mean &2 follows that 


p@, 6 (2) = TT v (2s 85 — Be, Ty (3, x0 ee: (B11) 


i,j,k 


The Top-Level Marginalized DIC 

Let A = (&, 0”, y, T), where T denotes the covariance of level 3 random effects 
(see Equation 3), and y denotes the vector of regression coefficients in level 2. The 
method to calculate the top-level marginalized DIC refers to Fox’s method (2010). 
Interested readers can refer to Fox (2010, pp. 190-191) for a detailed description. 

Based on Bayes’s formula, the augmented likelihood in Equation 13 can be ob- 
tained as follows: 


A PB ly, T) 
T) = p@|A) = p@|p, A)—  _. (B12) 
PB |z, A) 
The first part of Equation B12, the conditional augmented likelihood given (f, A), 
is derived as 


p (zip, A)= [1 @xy oo. Zip + Dy — a,9;) | 


exp{-2 552 (9; — xjBj)' (9; - x;B,)} 


where 0;; is normally distributed as in Equation B1. 

Then, the density function of B; given (y, T) is a multivariate normal probability 
density function with mean @; y and covariance matrix T. The conditional distribu- 
tion of B; |z;, A is multivariate normal with mean 


E(B; |2;, A) = jy + (Px; @ a')H; (2; — Kja;y@a—1,,@b)), (B14) 
and covariance matrix 


= Var(B; |zj, A) = T + (Px! @ aH; (xT @ a), (B15) 


pz 


(B13) 


where H; toy i Tx; ® aa’ +1,,® (o?aa’ + Ix). 
The devinnee evaluated using the posterior mean, obtained by performing (B12), 
is given by 


W\% /2 
(r) (r) —Knj;/2 on (r) 21S 
P\Z; |A? ) = Qn) ™ x) oe 


0] exp {— B (0, BY) (2h, (B16) 
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where 


2 
(rr) (am) a) (r) 5” (rya(r) \~ -207) ( air) (r) (r) (r) 
B® (0, BP) = S> (22 +0 - ao?) +02 (a? — x8) (0? — 2,84 


i,k 


4 (8)? - oy?) (6? - wy" ey (B17) 


Moreover, the deviance evaluated at the posterior mean A follows that 


A 


nj/2 
p (@|A) = @m-m?( =) |)" |S)! exp {-B (6;.B;) 2}, B18) 
where 
B (6;,B;) = Yo (Gin + be — &6;)° + 8-76; — x)6,)' (6; — x8) 


A 


+ (8; — @;;)' (6; — @)%)). (B19) 


The Top-Level Joint DIC 
The top-level joint likelihood used to calculate the posterior mean of the deviance 
in Equation 15 is then 


p (2,8, 8 a) = TT w (zitsal9y? — 29°, 1) 


i,j,k 


I] wv (017: xi8)?. 0%”) Wavy Ge oy”, ae (B20) 
i,j 


where Wryyv(-;, 2) denotes the multivariate normal density function with mean w 
and covariance matrix ©. Moreover, the deviance evaluated at the posterior mean A 
follows that 

P (2, 0, B |A) = I] y (253 494) a by. 1) I] w (63x16, 6°) Wavy (8,3. ;9,T). (B21) 


ijk ij 
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